Fast evaluation of Helmholtz potential on graphics processing units (GPUs)

Authors

  • Shaojing Li
  • Boris Livshitz
  • Vitaliy Lomakin
Abstract

The non-uniform grid method (NGM) is a fast algorithm that accelerates integral-equation-based evaluation of static and dynamic fields in areas such as electromagnetics, optics, and magnetics. The NGM reduces the computational complexity of directly evaluating the interactions between N unknowns from O(N^2) to O(N) in the static and low-frequency regimes and to O(N log N) in the high-frequency regime. Compared with other fast algorithms such as the fast multipole method (FMM), the NGM also adapts to sparse problems, mixed-frequency problems, and problems with reduced dimensionality or non-uniform geometries. Graphics processing units (GPUs), an emerging hardware platform for the computational science community, offer researchers cluster-level computational power at an exceptionally low price and with remarkable flexibility. Because the NGM relies on smoothing of the spatial integral kernel and on interpolation, combined with hierarchical domain decomposition, to achieve its speed and adaptivity, it cannot be migrated to the GPU platform readily. The NGM code must be parallelized, and sometimes rewritten completely from a different perspective, in order to use all of the computational resources and to work around the weak points of GPUs, which were originally designed for graphics applications. After significant parallelization and modification, the GPU implementation presented in this paper handles problems with up to 16 million unknowns at various accuracy levels, running up to 120 times faster than the CPU version of the NGM and a few million times faster than direct computation.
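
For orientation, the O(N^2) direct evaluation that serves as the baseline in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `direct_helmholtz_potential`, the kernel normalization exp(ikr)/r (without a 1/(4π) factor), and the choice to skip coincident points are all assumptions made here for the example.

```python
import cmath
import math


def direct_helmholtz_potential(sources, charges, targets, k):
    """Directly sum u(t) = sum_j q_j * exp(i*k*r) / r over all sources.

    Hypothetical helper for illustration. The cost is proportional to
    len(targets) * len(sources), i.e. O(N^2) when both sets have N points.
    """
    potentials = []
    for t in targets:
        total = 0j
        for s, q in zip(sources, charges):
            r = math.dist(t, s)  # Euclidean distance (Python 3.8+)
            if r == 0.0:
                continue  # assumption: skip the singular self-interaction
            total += q * cmath.exp(1j * k * r) / r
        potentials.append(total)
    return potentials
```

Setting `k = 0` reduces the kernel to the static 1/r case. The nested loop makes the quadratic cost explicit, which is the cost that a fast method such as the NGM is designed to avoid.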

Similar resources

Fast evaluation of Helmholtz potential on graphics processing units (GPUs)

Reference [1] L. Greengard and V. Rokhlin, J. Comput. Phys., vol. 73, pp. 325-348, 1987. [2] N. A. Gumerov and R. Duraiswami, J. Comput. Phys., vol. 227, pp. 8290-8313, 2008. [3] A. Boag and E. Michielssen, A. Brandt, IEEE Antennas and Wireless Propagation Letters, vol. 1, pp.142-145, 2002 [4] A. Boag and B. Livshitz, IEEE Trans. on MTT, vol. 54, pp. 3365-3570, 2006. [5] A. Boag, V. Lomakin, an...

Investigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)

Although sparse matrix-vector multiplication (SPMV) algorithms are simple, they form an important part of linear algebra algorithms in mathematics and physics. Because these algorithms can be run in parallel, graphics processing units (GPUs) have been considered one of the best candidates for running them. In recent years, power consumption has been considered as one of the metr...

An Implementation of Low-Frequency Fast Multipole BIEM for Helmholtz’ Equation on GPU

Acceleration of the fast multipole method (FMM), which is the fast and approximate algorithm to compute the pairwise interactions among many bodies, with graphics processing units (GPUs) has been investigated for the last couple of years. In view of the type of kernel functions, the non-oscillatory kernels (especially, the Laplace kernel) were studied by many researchers (e.g. Gumerov), and the...

Hardware Acceleration for CGP: Graphics Processing Units

Graphic Processing Units (GPUs) are fast, highly parallel units. In addition to processing 3D graphics, modern GPUs can be programmed for more general-purpose computation. A GPU consists of a large number of ‘shader processors’, and conceptually operates as a single instruction multiple data (SIMD) or multiple instruction multiple data (MIMD) stream processor. A modern GPU can have several hund...

IfI - 06 - 11 Clausthal - Zellerfeld 2006

In this paper, we present a novel approach for parallel sorting on stream processing architectures. It is based on adaptive bitonic sorting. For sorting n values utilizing p stream processor units, this approach achieves the optimal time complexity O((n log n)/p). While this makes our approach competitive with common sequential sorting algorithms not only from a theoretical viewpoint, it is als...

Publication date: 2009